IDEA (Interactive Differential Expression Analyzer) is an online analysis and visualization platform for differential feature expression analysis of read count data, built on R, Shiny and JavaScript. IDEA provides five R packages for count data analysis: DESeq, edgeR, NOISeq, PoissonSeq and SAMseq.
Tips are shown when the user moves the cursor over charts, icons or question marks, as shown in Figure 0-1 (upper) and Figure 0-2 (lower). For charts with interactive options, the option panel appears with the chart by default; it can be closed and reopened by clicking the gear icon, as shown in Figure 0-2. Every figure on the page can also be downloaded separately by clicking its download icon (Figure 0-2).
In this module, users choose the experiment type and upload the data in a specific format.
An example dataset is provided. Clicking Example shows the data information of the example. The design matrix and the count matrix of the example are each available for download, as shown in Figure 1-2.
IDEA works with three experiment types: standard comparison, multi-factor design, and experiments without replicates (NOT RECOMMENDED).
"Standard Comparison" supports experiments with only one factor, several conditions and several replicates (example shown in Table 1-1).
The data matrix should be in .csv or .txt format, with genes as rows and samples as columns. Upload raw counts only.
If you check the Header box, the first row and the first column of the matrix are treated as headers. For Separator, you can choose comma (,), semicolon (;) or tab. If your data is quoted, choose the matching Quote option; otherwise, choose "None".
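The following is a minimal sketch of how the Header, Separator and Quote options relate to parsing a count table. It is not IDEA's own code: the helper name `read_count_matrix` and the sample data are hypothetical, and the sketch assumes the Header box is checked.

```python
import csv
import io


def read_count_matrix(text, separator=",", quote='"'):
    """Parse a genes-by-samples table into (sample_names, {feature: counts}).

    Hypothetical helper: assumes the first row holds sample names and the
    first column holds feature names (Header box checked).
    """
    if quote is None:
        # Corresponds to choosing "None" for the Quote option.
        reader = csv.reader(io.StringIO(text), delimiter=separator,
                            quoting=csv.QUOTE_NONE)
    else:
        reader = csv.reader(io.StringIO(text), delimiter=separator,
                            quotechar=quote)
    rows = [row for row in reader if row]
    samples = rows[0][1:]
    counts = {}
    for row in rows[1:]:
        values = [int(v) for v in row[1:]]  # raw integer counts only
        if len(values) != len(samples):
            raise ValueError(f"row {row[0]!r} does not match the header width")
        counts[row[0]] = values
    return samples, counts


# Made-up example: semicolon-separated, double-quoted names.
example = 'gene;"ctrl_1";"ctrl_2";"treat_1"\n"geneA";10;12;30\n"geneB";0;3;1\n'
samples, counts = read_count_matrix(example, separator=";", quote='"')
```

Checking a file locally in this way before uploading can catch a mismatched separator or non-integer values early.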
Optionally, you can also upload a "gene length file" with feature names in column 1 and lengths in column 2. Its header and separator settings should be the same as those of the count table.
The conditions of the experiment should also be uploaded in .csv or .txt format; the conditions then appear on the page automatically for selection. For a multi-condition experiment, choose the two conditions to compare when calling differentially expressed features; by default, the first two conditions are compared. In Multi-factor Design, the factor of interest is the first column of your design matrix.
Clicking "View uploaded data" shows the uploaded read count table in the "Data Information Table". The table can be sorted by clicking a column header. The number of features shown per page can be changed at the top left.
The "Data" module handles data normalization, data exploration and quality control.
Normalization addresses the various factors (nucleotide composition of features, library preparation, sequencing platform, etc.) that can bias the number of reads in read count data. Three methods are provided to normalize sample data (Table 2-1).
| Abbreviation | Full Name | Method Details |
| :----------: | :-------: | :------------: |
| RPKM | Reads Per Kilobase per Million Reads | Divide gene count by the total number of reads in each library or mapped reads |
| UQ | Upper Quartile | Sum gene counts up to the upper 25% quartile to normalize |
| TMM | Trimmed Mean of M | Compute a scaling factor as the weighted mean of log ratios between two experiments after excluding the most expressed genes and genes that have large log ratios in expression |
Choose the normalization method on the panel before doing anything else. If normalization is unnecessary, choose "None". By default, the Upper Quartile method is selected.
Different figures and tables may have specific settings. See the options and instructions for particular charts, such as Stacked Density Plot, Heat Map of Sample Distance or Correlation Analysis.
See the Report for more chart instructions and details.
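As a rough illustration of two of the methods in Table 2-1, the sketch below uses the commonly cited RPKM formula (which also divides by feature length in kilobases) and a simple upper-quartile scaling. The function names are hypothetical, and the exact quantile rule and handling of zero counts are assumptions; IDEA's implementation may differ. TMM is omitted because it needs more machinery.

```python
import statistics


def rpkm(count, gene_length_bp, library_size):
    """Reads per kilobase of feature length per million reads in the library."""
    return count * 1e9 / (gene_length_bp * library_size)


def upper_quartile_scale(sample_counts):
    """Divide one sample's counts by the 75th percentile of its non-zero counts.

    Assumption: zero counts are excluded before taking the quartile, and the
    "inclusive" quantile method is used. At least two non-zero counts are needed.
    """
    nonzero = [c for c in sample_counts if c > 0]
    q3 = statistics.quantiles(nonzero, n=4, method="inclusive")[2]
    return [c / q3 for c in sample_counts]


# Made-up numbers for illustration only.
example_rpkm = rpkm(count=500, gene_length_bp=2000, library_size=10_000_000)
example_uq = upper_quartile_scale([0, 1, 2, 3, 4, 5])
```

RPKM needs feature lengths, which is why the optional gene length file matters for that method.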
The samples boxplot visualizes the count distribution for all samples, showing the distribution of feature expression in each sample.
The stacked density plot visualizes the density distribution of features with different read counts, showing the overall condition of the normalized read count data. In the interactive option, click the input box to add samples and click Submit to plot; remove a sample by clicking the cross next to it.
The ratio bar plot visualizes the distribution of counts in each sample using stacked bars. Low counts may introduce noise and interfere with the extraction of differentially expressed features.
The heat map of sample distance visualizes the Euclidean distance between samples, giving an overview of the similarities between samples. The heat map color can be changed in the interactive option. Note that if the samples have a clear classification, this plot may show only two colors, as shown in Figure 2-5.
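The pairwise Euclidean distances behind such a heat map can be sketched as follows. The function name and the toy values are hypothetical, and whether IDEA computes distances on raw, normalized or transformed counts is not stated here, so treat the input as an assumption.

```python
import math


def sample_distances(expression):
    """expression: {sample: [value per feature]} -> {sample: {sample: distance}}."""
    names = list(expression)
    return {
        a: {b: math.dist(expression[a], expression[b]) for b in names}
        for a in names
    }


# Toy data: two features measured in three samples.
toy = {"ctrl_1": [0.0, 0.0], "ctrl_2": [0.0, 1.0], "treat_1": [3.0, 4.0]}
distances = sample_distances(toy)
```

Samples from the same condition should show small distances to each other and larger distances to samples from other conditions.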
A principal component analysis (PCA) plot visualizes the samples along the first two principal components. Data labels can optionally be shown in the figure via the interactive option panel.
Using a scatter plot, the correlation analysis visualizes Spearman's correlation of feature expression between two selected samples. The Spearman correlation coefficient is shown in the Interactive Option panel of the correlation analysis. The samples plotted on the x and y axes can also be changed on the panel.
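Spearman's correlation is the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below is a plain standard-library illustration with hypothetical function names, not the routine IDEA calls.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks are used, any monotonic relationship between two samples gives a coefficient of 1, even when the relationship is not linear.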
Figure 2-7 Scatter plot and correlation analysis and interactive option

A histogram with error bars visualizes the comparison of the expression level of a certain feature, selected by the user in the interactive option, across different conditions. The mean, standard deviation and standard error of the chosen feature are also shown in the interactive option panel. The Report will only show the figure for the feature chosen here.
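The three summary statistics for one feature can be sketched per condition as below. The function name and values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption about how the panel computes it.

```python
import math
import statistics


def summarize(values):
    """Return (mean, standard deviation, standard error) for one condition."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample SD, n - 1 denominator (assumed)
    se = sd / math.sqrt(len(values))   # standard error of the mean
    return mean, sd, se


# Made-up normalized counts of one feature in two conditions.
by_condition = {"control": [2, 4, 6], "treated": [10, 10, 13]}
summary = {cond: summarize(vals) for cond, vals in by_condition.items()}
```

The error bar in such a chart typically spans the mean plus and minus either the standard deviation or the standard error, so it is worth checking which one a given figure uses.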
The normalized data in .csv format and the report of all charts can be downloaded on the panel.
The "Analysis" module is divided into two parts: the package analysis part and the combination part. The package analysis part provides five methods for analyzing differential expression of features. The basic features of each method are introduced briefly here. The summary table is given below (Table 3-1).
| Package | Version | Normalization (default) | Model of Reads Count Distribution | Differential Expression Test | FDR Control | Standard Comparison | Multi-factor Design | Without Replicates |
| ------- | ------- | ----------------------- | --------------------------------- | ---------------------------- | :---------: | :-----------------: | :-----------------: | :----------------: |
| DESeq2 | 1.6.2 | sizeFactors | Negative binomial distribution | Wald test, LRT | Benjamini-Hochberg procedure | | | |
| edgeR | 3.8.3 | TMM | Negative binomial distribution | Fisher's exact test | Benjamini-Hochberg procedure | | | |
| NOISeq | 2.8.0 | RPKM | Nonparametric method | P-value for empirical distributions | Not applicable | | | |
| PoissonSeq | 1.1.2 | Goodness-of-fit estimate | Poisson distribution | Score statistics | A permutation plug-in approach | | | |
| SAMseq (samr) | 2.0 | Subsampling method | Nonparametric method | Wilcoxon test | A permutation plug-in approach | | | |
Make sure of your experiment type. Multi-factor designs can only be analyzed with the DESeq2 and edgeR packages, and PoissonSeq and SAMseq are not available for experiments without replicates.
Click the icon of a package and then click START; the charts for that package will be shown on the page.
Different packages provide different visualizations. Table 3-2 summarizes the charts available in each package.
| Chart/Plot Type | DESeq | edgeR | NOISeq | PoissonSeq | SAMseq |
|:-:|:-:|:-:|:-:|:-:|:-:|
| Differential Expression | | | | | |
| Features Table | | | | | |
| MA-Plot | | | | | |
| Normalized SizeFactors | | | | | |
| Volcano Plot | | | | | |
| Heat Map | | | | | |
| FDR/P-value/Probability | | | | | |
| Distribution | | | | | |
| Variance Estimation | | | | | |
| Power Transformation Curve | | | | | |
| Q-Q Plot | | | | | |
See the Report for more chart instructions and details.
The table can be sorted by clicking any column name in the first row. The number of features shown per page can be changed at the top left. The search function searches the data in any column.
The interpretation of the columns in all packages is shown in the table below.
In the MA-plot, the data is transformed onto the M (fold change or log ratio) and A (average expression of a feature) scale, which gives users a quick overview of the distribution of the data. The false discovery rate (FDR) threshold can be changed; features are colored red if the adjusted p-value is less than the FDR threshold, and other features are colored blue.
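One common way to obtain M and A for a feature from its mean expression in two conditions is sketched below. The function names, the pseudocount and the default threshold are assumptions for illustration; individual packages define the average-expression axis in their own ways.

```python
import math


def ma_point(mean_a, mean_b, pseudocount=1.0):
    """Return (M, A): log2 ratio and average log2 expression of two conditions.

    The pseudocount avoids log2(0) and is an assumption of this sketch.
    """
    log_a = math.log2(mean_a + pseudocount)
    log_b = math.log2(mean_b + pseudocount)
    return log_b - log_a, (log_a + log_b) / 2


def point_color(adjusted_p, fdr=0.05):
    """Red below the FDR threshold, blue otherwise, as described above."""
    return "red" if adjusted_p < fdr else "blue"


# Made-up means: 3 in condition A and 15 in condition B.
m_value, a_value = ma_point(3, 15)
```

Points far from M = 0 at moderate or high A are the usual candidates for differential expression.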
Since different samples may have different sequencing depths, it is necessary to put every count value on a common scale to make them comparable.
In the edgeR table, group represents the conditions, lib.size represents the size of the library, and norm.factors represents the normalized size factors.
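To illustrate how a per-sample scaling factor puts counts on a common scale, the sketch below follows the median-of-ratios idea associated with DESeq-style size factors. It is a simplified standard-library version with a hypothetical function name, not the packages' own code, and it skips features with a zero count in any sample.

```python
import math
import statistics


def size_factors(counts):
    """counts: list of feature rows, each a list of per-sample raw counts."""
    n_samples = len(counts[0])
    # Only features observed in every sample have a usable geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


# Made-up counts: the second sample was sequenced about twice as deeply.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = size_factors(toy_counts)
scaled = [[c / f for c, f in zip(row, factors)] for row in toy_counts]
```

After dividing by the factors, the first three features have equal values in both samples, which is the sense in which the counts become comparable.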
The volcano plot gives an overview of the number of differentially expressed features. The thresholds of both axes can be changed on the Interactive Option panel. Highly differentially expressed features are colored blue, while others are colored red.
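A minimal sketch of how two axis thresholds classify a point is given below. It assumes the usual volcano layout (log2 fold change on the x-axis, negative log10 p-value on the y-axis); the function name and the default thresholds are hypothetical.

```python
import math


def volcano_point(log2_fc, p_value, fc_threshold=1.0, p_threshold=0.05):
    """Return (x, y, color) for one feature under the assumed axis layout."""
    x = log2_fc
    y = -math.log10(p_value)
    # A feature must pass both thresholds to count as highly differential.
    passes = abs(log2_fc) >= fc_threshold and p_value <= p_threshold
    return x, y, "blue" if passes else "red"


strong = volcano_point(2.0, 0.001)
weak_fold_change = volcano_point(0.2, 0.001)
not_significant = volcano_point(-3.0, 0.5)
```

Tightening either threshold moves points from blue to red, which is a quick way to see how sensitive the count of differential features is to the cutoffs.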
Using a color scale, the heat map displays the expression values of the features; every rectangle represents one feature-sample pair. By default, the 30 most highly expressed features are displayed, and this number can be changed on the option panel. In addition, the scale method (normalize data by row or column), the clustering of rows/columns and the color key can also be changed on the panel.
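Selecting the top features and scaling by row can be sketched as follows. Ranking by mean expression and z-score scaling are assumptions of this sketch, and the function names are hypothetical; the panel may rank or scale differently.

```python
import statistics


def top_expressed(matrix, n=30):
    """matrix: {feature: [value per sample]} -> n feature names by mean, descending."""
    ranked = sorted(matrix, key=lambda f: statistics.mean(matrix[f]), reverse=True)
    return ranked[:n]


def scale_row(values):
    """Z-score one feature across samples; constant rows map to zeros."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


# Toy matrix with three features and three samples.
toy = {"a": [1, 2, 3], "b": [10, 20, 30], "c": [5, 5, 5]}
selected = top_expressed(toy, n=2)
scaled = {f: scale_row(toy[f]) for f in selected}
```

Row scaling highlights how each feature varies across samples, whereas unscaled values let the most abundant features dominate the color range.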
The FDR/P-value distribution plot uses a histogram to visualize the distribution of the FDR or P-value from the differential expression test provided by each analysis package. Specifically, NOISeq uses the q-value (standard comparison) or prob (without replicates) for the distribution plot.
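Table 3-1 lists the Benjamini-Hochberg procedure as the FDR control for DESeq2 and edgeR. The sketch below shows that adjustment in plain Python for illustration; the function name is hypothetical and the packages use their own routines.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four features.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

A histogram of raw p-values that is roughly flat with a peak near zero is the pattern usually expected when some features are truly differential.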
The dispersion estimates plot is for checking the result of the dispersion estimate adjustment. The feature-wise estimates are in black, the fitted estimates are in red, and the final estimates are in blue. Outliers of the feature-wise estimates are marked with blue circles. Points lying at the bottom have a dispersion of practically or exactly zero.
The variance estimation plot has average log CPM (counts per million) on the x-axis and the biological coefficient of variation on the y-axis. The red dots represent the common dispersion and the black dots represent the tag-wise (feature-wise) dispersion.
The power transformation curve is for estimating the best parameter for minimizing overdispersion of the data. It plots one over theta on the y-axis and mean log mu on the x-axis. See more details in the Report.
The Q-Q plot, also called the SAM plot in SAMseq, is a scatter plot with dots representing features. Positive significant features, whose higher expression correlates with higher risk, are in red; negative significant features are in green; all others are in black.
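The two axis quantities can be related to raw inputs as in the sketch below. In edgeR's terminology the biological coefficient of variation is the square root of the negative binomial dispersion; the function names are hypothetical, and the plain log2 without a prior count is a simplification.

```python
import math


def cpm(count, library_size):
    """Counts per million reads in the library."""
    return count * 1e6 / library_size


def average_log_cpm(counts, library_sizes):
    """Mean log2 CPM across samples (simplified: no prior count is added)."""
    logs = [math.log2(cpm(c, n)) for c, n in zip(counts, library_sizes)]
    return sum(logs) / len(logs)


def bcv(dispersion):
    """Biological coefficient of variation from a dispersion estimate."""
    return math.sqrt(dispersion)


# Made-up values for illustration.
example_cpm = cpm(50, 2_000_000)
example_bcv = bcv(0.04)
```

A dispersion of 0.04 therefore corresponds to a BCV of 0.2, that is, about 20% variation between biological replicates.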
A markdown file is provided for each package as its analysis report and can be downloaded from the panel. Note that each package generates its own report.
The differential features table (.csv format) can also be downloaded separately from the other charts via "Download .csv file".
Clicking the icon in Figure 3-11 enters the "Combination" mode. The "Combination" module collects and compares the results of the packages used previously. Figures such as the Venn diagram and bar plot give an intuitive impression of the differences between the results produced by each package. A new measure called R-value is also defined to synthesize the differential expression results from these packages.
Users can choose which packages to include in the analysis on the "Advanced Option" panel.
This plot compares the number of differentially expressed features identified by each package. The differences between the results are caused by differences in the algorithms of these packages. The feature counts are plotted on the y-axis and each bar represents one package.
The Venn diagram visualizes the overlap of differentially expressed features identified by each package. In the diagram, each oval represents one package, and each number shown is a number of differentially expressed features.
This table shows the identification details of every feature. The table can be sorted by clicking any column name in the first row, and the number of features shown per page can be changed at the top left.
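The numbers in a Venn diagram and the heights in a per-package bar plot can both be derived from sets of feature identifiers, as in this sketch. The function name and the feature calls are made up for illustration.

```python
def venn_regions(calls):
    """calls: {package: set of feature IDs}.

    Returns {frozenset of packages: number of features called by exactly
    those packages}, i.e. one entry per non-empty Venn region.
    """
    regions = {}
    for feature in set().union(*calls.values()):
        key = frozenset(pkg for pkg, found in calls.items() if feature in found)
        regions[key] = regions.get(key, 0) + 1
    return regions


# Made-up differential expression calls from three packages.
calls = {
    "DESeq2": {"g1", "g2", "g3"},
    "edgeR": {"g2", "g3", "g4"},
    "NOISeq": {"g3"},
}
bar_heights = {pkg: len(found) for pkg, found in calls.items()}
regions = venn_regions(calls)
```

Features in the region shared by all packages are the calls least dependent on any single algorithm.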
The interpretation of some columns is shown in the table.
| Header | Interpretation |
| :-------: | ------ |
| FeatureID | Feature identifier |
| Mean | Mean of expression |
| LogFC | Logarithm of the fold change |
| Rankmean | Mean rank of the five packages |
| Score | Integration score of rank lists of DE features by robust rank aggregation (RRA) |
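The Rankmean column can be pictured with the sketch below, which averages a feature's 1-based position across the ranked lists from each package. The function name and the lists are hypothetical, and how features missing from a list are treated is an assumption (here they are averaged over the lists that contain them). The RRA score is not reproduced.

```python
def rank_mean(rankings):
    """rankings: {package: [feature IDs, most significant first]}."""
    positions = {}
    for ranked in rankings.values():
        for position, feature in enumerate(ranked, start=1):
            positions.setdefault(feature, []).append(position)
    return {f: sum(p) / len(p) for f, p in positions.items()}


# Made-up ranked lists from two packages.
rankings = {
    "DESeq2": ["g1", "g2", "g3"],
    "edgeR": ["g2", "g1", "g3"],
}
means = rank_mean(rankings)
```

A lower mean rank indicates that a feature sits near the top of the lists consistently across packages.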
As with the other modules, the report of "Combination" is available for download. The .csv result contains the "Feature Weight Table".